1. Introduction. Let $X_1, \ldots, X_n$ be a random sample from a distribution with unknown density $f$ on $\mathbb{R}$, and let

$$f_n(x \mid h) = (nh)^{-1} \sum_{i=1}^{n} K\{(x - X_i)/h\}$$

be a nonparametric estimator of $f$ based on kernel $K$ and window $h$.
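Here is a minimal Python sketch of the estimator $f_n(x \mid h)$. It assumes the biweight kernel $K(z) = \tfrac{15}{16}(1-z^2)^2$ on $[-1,1]$, which is one kernel satisfying condition (2.1) of Section 2. That choice is ours, not the paper's, and the function names are hypothetical. The check confirms that the estimate integrates to one.

```python
def biweight(z: float) -> float:
    """Biweight kernel: compactly supported on [-1, 1], symmetric, integrates to 1."""
    return 15.0 / 16.0 * (1.0 - z * z) ** 2 if abs(z) <= 1.0 else 0.0

def kde(x: float, sample: list, h: float, kernel=biweight) -> float:
    """Kernel density estimate f_n(x | h) = (n h)^{-1} * sum_i K((x - X_i) / h)."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

def integrate(g, lo: float, hi: float, m: int = 2000) -> float:
    """Composite trapezoidal rule on [lo, hi] with m subintervals."""
    step = (hi - lo) / m
    total = 0.5 * (g(lo) + g(hi)) + sum(g(lo + j * step) for j in range(1, m))
    return total * step

sample = [-0.8, -0.1, 0.3, 0.4, 1.2]
h = 0.5
# With a kernel supported on [-1, 1], f_n vanishes outside [min X - h, max X + h].
mass = integrate(lambda x: kde(x, sample, h), min(sample) - h, max(sample) + h)
print(round(mass, 3))
```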

The problem of choosing $h$ so as to "minimise error", in some sense, is legion in the theory and practice of nonparametric density estimation. Commonly, the criterion used to measure loss is mean integrated square error (MISE),

$$M(h) = \int E\{f_n(x \mid h) - f(x)\}^2\,dx .$$

See for example Rosenblatt [17]. This approach has its roots in the classical theory of nonparametric density estimation, where the window $h$ is taken to be non-random. Of course, the value $h_0$ which minimises $M(h)$ depends on the unknown density $f$. Any attempt to estimate this "optimal" $h$ must result in a window which is a function of the sample values. That is, the value of $h$ must in practice be a random variable. Bearing this in mind, it seems to us that one should try from the outset to minimise integrated square error (ISE),

$$A(h) = \int \{f_n(x \mid h) - f(x)\}^2\,dx ,$$

instead of MISE. If $\hat h_0$ (a random variable) minimises $A$, and $h_0$ (non-random) minimises $M$, then $E\{A(h_0)\} \ge E\{A(\hat h_0)\}$. In this sense, $\hat h_0$ improves on $h_0$.

Let $\hat h$ be a "data-driven" bandwidth, estimated from the sample in some way. Our aim in this paper is to examine the distance between $\hat h$ and $\hat h_0$, and the distance between $A(\hat h)$ and $A(\hat h_0)$. Of course, $A(\hat h) \ge A(\hat h_0)$.

We ask: how much greater than the minimum, $A(\hat h_0)$, is $A(\hat h)$?

There are at least two approaches to constructing $\hat h$: the classical argument, which essentially tries to estimate $h_0$; and least-squares cross-validation (Bowman [2], [3]; Rudemo [19]). The cross-validatory window is that value $\hat h_c$ which minimizes

$$CV(h) = \int f_n^2(x \mid h)\,dx - 2n^{-1} \sum_{i=1}^{n} f_{ni}(X_i \mid h),$$

where $f_{ni}(x \mid h) = \{(n-1)h\}^{-1} \sum_{j \ne i} K\{(x - X_j)/h\}$ is the kernel density estimate obtained by leaving out sample value $X_i$. The intuitive appeal of cross-validation is that it sidesteps secondary issues such as theoretical properties of MISE, and goes straight to the heart of the problem by minimizing an estimate of $A(h) - \int f^2$. (Notice that $CV(h)$ is unbiased for $M(h) - \int f^2$.) We shall show that this directness pays dividends. In a range of situations, including the multivariate case, the difference between $A(\hat h_c)$ and $A(\hat h_0)$ is of the same order of magnitude as the difference between $A(h_0)$ and $A(\hat h_0)$, under minimal smoothness conditions on $f$.

(The common order is $n^{-1}$.) In this sense, the classical "best but unachievable strategy" of using $h_0$ is no better than the achievable strategy of least-squares cross-validation. Furthermore, neither $\hat h_c$ nor $h_0$ consistently outperforms the other, since the probabilities

$$P\{A(\hat h_c) > A(h_0)\}, \qquad P\{A(\hat h_c) < A(h_0)\}$$

both converge to strictly positive limits.
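Here is a minimal sketch of least-squares cross-validation as defined above. It assumes the biweight kernel, a trapezoidal approximation to $\int f_n^2$, and a search over a finite grid of windows. The paper minimises over all $h > 0$, so the grid, seed and sample size here are only illustrative choices. The leave-one-out term uses the identity $n^{-1}\sum_i f_{ni}(X_i \mid h) = \{n(n-1)h\}^{-1}\sum_{i \ne j} K\{(X_i - X_j)/h\}$.

```python
import random

def biweight(z):
    """Biweight kernel on [-1, 1] (an assumed choice satisfying (2.1))."""
    return 15.0 / 16.0 * (1.0 - z * z) ** 2 if abs(z) <= 1.0 else 0.0

def kde(x, sample, h):
    """f_n(x | h) = (n h)^{-1} sum_i K((x - X_i) / h)."""
    return sum(biweight((x - xi) / h) for xi in sample) / (len(sample) * h)

def integral_fn_squared(sample, h, m=400):
    """Trapezoidal approximation to the integral of f_n(x | h)^2 over its support."""
    lo, hi = min(sample) - h, max(sample) + h
    step = (hi - lo) / m
    vals = [kde(lo + j * step, sample, h) ** 2 for j in range(m + 1)]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def cv(sample, h):
    """Least-squares cross-validation criterion CV(h)."""
    n = len(sample)
    # n^{-1} sum_i f_ni(X_i | h) equals {n (n-1) h}^{-1} sum_{i != j} K((X_i - X_j) / h)
    loo = sum(biweight((xi - xj) / h)
              for i, xi in enumerate(sample)
              for j, xj in enumerate(sample) if i != j) / (n * (n - 1) * h)
    return integral_fn_squared(sample, h) - 2.0 * loo

random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(100)]
grid = [0.2 + 0.1 * j for j in range(19)]        # candidate windows 0.2, ..., 2.0
h_c = min(grid, key=lambda h: cv(sample, h))     # grid approximation to the CV minimiser
print(round(h_c, 2))
```

Only the data enter `cv`. The unknown $f$ is never used, and that is the practical point of the criterion.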
One class of competitors to $\hat h_c$ consists of two-stage ("plug-in") procedures, which aim to estimate the constant $c_0$ in the asymptotic formula $h_0 \sim c_0 n^{-1/5}$ (valid in one dimension). They cannot be expected to perform better than if the precise value of $h_0$ had been available.

They can produce windows $\hat h$ for which $A(\hat h) - A(\hat h_0)$ is of a larger order of magnitude than $A(h_0) - A(\hat h_0)$, depending on their construction and the extent of additional smoothness assumptions.
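To make the two-stage idea concrete, here is a minimal sketch of a plug-in window for the biweight kernel. It uses the constants of Section 2, $c_0 = (c_1/4c_2)^{1/5}$ with $c_1 = \int K^2 = 5/7$ and $c_2 = k^2 \int (f'')^2$, where $k = 1/14$. The sketch estimates $\int (f'')^2$ by integrating the square of a Gaussian-kernel estimate of $f''$, then sets $\hat h = \{c_1/(4k^2 \hat R)\}^{1/5} n^{-1/5}$. The pilot bandwidth $g = s\,n^{-1/9}$ ($s$ the sample standard deviation) and the function name are our assumptions, not a procedure from the paper. The choice of pilot is exactly where such methods depend on extra smoothness.

```python
import math
import random
import statistics

def phi(u):
    """Standard normal density, used as the pilot kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def plug_in_window(X, c1=5.0 / 7.0, k=1.0 / 14.0, grid_pts=1200):
    """Hypothetical two-stage window for the biweight kernel (c_1 = 5/7, k = 1/14).

    Stage 1: estimate R = integral of (f'')^2 by integrating the square of a
    Gaussian-kernel estimate of f'' with pilot bandwidth g (an arbitrary choice).
    Stage 2: plug R into h = (c_1 / (4 k^2 R))^{1/5} n^{-1/5}.
    """
    n = len(X)
    g = statistics.stdev(X) * n ** (-1.0 / 9.0)   # assumed pilot bandwidth
    lo, hi = min(X) - 5.0 * g, max(X) + 5.0 * g
    step = (hi - lo) / grid_pts
    vals = []
    for j in range(grid_pts + 1):
        x = lo + j * step
        # f''_hat(x) = (n g^3)^{-1} sum_i phi''((x - X_i)/g), where phi''(u) = (u^2 - 1) phi(u)
        d2 = sum((((x - xi) / g) ** 2 - 1.0) * phi((x - xi) / g) for xi in X) / (n * g ** 3)
        vals.append(d2 * d2)
    R = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return (c1 / (4.0 * k * k * R)) ** 0.2 * n ** -0.2

random.seed(4)
X = [random.gauss(0.0, 1.0) for _ in range(400)]
h_pi = plug_in_window(X)
print(round(h_pi, 3))
```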

We shall close this section by relating our contributions to recent work in the area. Theorem 2.3 of Rice (1984) is close to our Theorem 2.1, but in the context of nonparametric regression. Asymptotic first-order optimality of least-squares cross-validation in density estimation has been established by Hall [11], [13] and Stone [21]; Stone's work assumes minimal conditions on $f$. Other forms of cross-validation in nonparametric density estimation have been considered by Habbema, Hermans and van den Broek [9], Duin [8], Chow, Geman and Wu [5], Bowman, Hall and Titterington [4] and Marron [14], [15]. The last three papers take quite a general view of the principle of cross-validation.

A recent survey by Titterington [22] sets cross-validation in context as a smoothing technique. First- and second-order properties of the difference between ISE and MISE have been examined by Bickel and Rosenblatt [1], Rosenblatt [18], Csörgő and Révész [6] (pp. 228-229) and Hall [12]. Finally, we should point out that although $L^2$ measures of error, such as MISE, are very widely accepted, there do exist alternatives; examples include supremum measures (Silverman [20]) and $L^1$ measures (Devroye and Györfi [7]).


2. Results. For the sake of clarity and brevity we shall state and prove our main results for the case of one-dimensional data, in the context of a positive kernel. Towards the end of this section we shall show that the theorems are readily extendible to any finite number of dimensions, and to more general kernels which may become negative in order to reduce bias.

We  impose  the  following  conditions  on  K  and  f: 

(2.1) $K$ is a compactly supported, symmetric function on $\mathbb{R}$ with Hölder-continuous derivative $K'$, and satisfies

$$\int K = 1, \qquad \int z^2 K(z)\,dz = 2k \ne 0 .$$

(A function $g$ is Hölder continuous if $|g(x) - g(y)| \le c|x - y|^{\epsilon}$ for some $c, \epsilon > 0$ and all $x, y$.)

(2.2) $f$ is bounded and twice differentiable, $f'$ and $f''$ are bounded and integrable, and $f''$ is uniformly continuous.


Define integrated square error $A$, mean integrated square error $M \equiv E(A)$, and the cross-validatory criterion $CV$ as in Section 1. Set $D = A - M$, and notice that $CV = A + \delta - \int f^2$, where

$$\delta \equiv 2\Big\{\int f_n f - n^{-1} \sum_{i=1}^{n} f_{ni}(X_i)\Big\} .$$
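The identity $CV = A + \delta - \int f^2$ is pure algebra, since $A(h) = \int f_n^2 - 2\int f_n f + \int f^2$. The sketch below checks it numerically. It assumes a standard normal $f$ (so that $\int f^2 = 1/(2\sqrt\pi)$ exactly), the biweight kernel, and trapezoidal integration on one fixed grid.

```python
import math
import random

def biweight(z):
    return 15.0 / 16.0 * (1.0 - z * z) ** 2 if abs(z) <= 1.0 else 0.0

def phi(x):
    """Standard normal density, playing the role of the 'unknown' f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def trapezoid(vals, step):
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

random.seed(2)
X = [random.gauss(0.0, 1.0) for _ in range(60)]
n, h = len(X), 0.6
lo, hi, m = -9.0, 9.0, 3600
step = (hi - lo) / m
xs = [lo + j * step for j in range(m + 1)]
fn = [sum(biweight((x - xi) / h) for xi in X) / (n * h) for x in xs]
f = [phi(x) for x in xs]

int_fn2 = trapezoid([a * a for a in fn], step)
int_fnf = trapezoid([a * b for a, b in zip(fn, f)], step)
ise = trapezoid([(a - b) ** 2 for a, b in zip(fn, f)], step)   # A(h)
int_f2 = 1.0 / (2.0 * math.sqrt(math.pi))                      # exact for N(0, 1)
loo = sum(biweight((X[i] - X[j]) / h) for i in range(n) for j in range(n)
          if i != j) / (n * (n - 1) * h)                       # n^{-1} sum_i f_ni(X_i)

cv = int_fn2 - 2.0 * loo
delta = 2.0 * (int_fnf - loo)
print(abs(cv - (ise + delta - int_f2)))
```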


Recall that $\hat h_0$, $\hat h_c$ and $h_0$ minimize $A$, $CV$ and $M$, respectively. Observe that

$$M(h) = (nh)^{-1}\int K^2 + (1 - n^{-1})\int\Big\{\int K(z) f(x - hz)\,dz\Big\}^2 dx - 2\int f(x)\,dx \int K(z) f(x - hz)\,dz + \int f^2 .$$

We may derive expressions for $M'(h)$ and $M''(h)$ by differentiating under the integral signs in this formula. In that way we may deduce that, with $c_1 = \int K^2$ and $c_2 = k^2 \int (f'')^2$, we have

$$M(h) = c_1 (nh)^{-1} + c_2 h^4 + o\{(nh)^{-1} + h^4\},$$
$$M''(h) = 2c_1 (nh^3)^{-1} + 12 c_2 h^2 + o\{(nh^3)^{-1} + h^2\}$$

as $h \to 0$ and $n \to \infty$. Consequently, $h_0 \sim c_0 n^{-1/5}$, where $c_0 = (c_1/4c_2)^{1/5}$, and $M''(h_0) \sim c_3 n^{-2/5}$, where $c_3 = 2c_1 c_0^{-3} + 12 c_2 c_0^2$. Set

$$L(z) = -zK'(z),$$
$$\sigma_0^2 = (2/c_0)^3 \Big(\int f^2\Big) \int \Big[\int K(y+z)\{K(z) - L(z)\}\,dz\Big]^2 dy + (4kc_0)^2\Big\{\int (f'')^2 f - \Big(\int f'' f\Big)^2\Big\},$$
$$\sigma_c^2 = (2/c_0)^3 \Big(\int f^2\Big)\Big(\int L^2\Big) + (4kc_0)^2\Big\{\int (f'')^2 f - \Big(\int f'' f\Big)^2\Big\}.$$
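For a concrete instance of these constants, take the biweight kernel $K(z) = \tfrac{15}{16}(1-z^2)^2$ on $[-1,1]$ and a standard normal $f$. Both are illustrative assumptions. Then $c_1 = 5/7$, $2k = 1/7$ and $\int (f'')^2 = 3/(8\sqrt\pi)$. The sketch computes $c_0$ and $c_3$. Working only with the leading terms $c_1(nh)^{-1} + c_2 h^4$, it then checks that $h_0 = c_0 n^{-1/5}$ is a stationary point with curvature $c_3 n^{-2/5}$.

```python
import math

# Assumed ingredients: biweight kernel and standard normal density f.
c1 = 5.0 / 7.0                                  # c_1 = integral of K^2
k = 0.5 * (1.0 / 7.0)                           # integral of z^2 K(z) dz = 2k = 1/7
int_f2dd = 3.0 / (8.0 * math.sqrt(math.pi))     # integral of (f'')^2 for N(0, 1)
c2 = k * k * int_f2dd                           # c_2 = k^2 * integral of (f'')^2

c0 = (c1 / (4.0 * c2)) ** 0.2                   # h_0 ~ c_0 n^{-1/5}
c3 = 2.0 * c1 * c0 ** -3 + 12.0 * c2 * c0 ** 2  # M''(h_0) ~ c_3 n^{-2/5}

def amise(h, n):
    """Leading terms of M(h): c_1 (n h)^{-1} + c_2 h^4."""
    return c1 / (n * h) + c2 * h ** 4

n = 1000
h0 = c0 * n ** -0.2
eps = 1e-4 * h0
slope = (amise(h0 + eps, n) - amise(h0 - eps, n)) / (2.0 * eps)
curv = (amise(h0 + eps, n) - 2.0 * amise(h0, n) + amise(h0 - eps, n)) / eps ** 2
print(round(c0, 3), curv / (c3 * n ** -0.4))
```

The value $c_0 \approx 2.78$ is the familiar normal-reference constant for the biweight kernel.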

The  structure  of  our  arguments  is  very  simple,  and  so  we  shall 
prove  our  main  results  here.  The  lemmas  in  Section  3  supply  all  the  rigour 
needed,  and  we  shall  refer  to  them  as  required. 

First, we prove a limit theorem for $\hat h_0 - h_0$. Observe that

$$0 = A'(\hat h_0) = M'(\hat h_0) + D'(\hat h_0) = (\hat h_0 - h_0) M''(h^*) + D'(\hat h_0), \tag{2.3}$$

where $h^*$ lies in between $\hat h_0$ and $h_0$. By Lemma 3.3, $\hat h_0 = h_0 + O_p(n^{-1/5-\epsilon})$ for some $\epsilon > 0$, and so by Lemma 3.2 (with $h_1 = h_0$), $D'(\hat h_0) = D'(h_0) + o_p(n^{-7/10})$. But Lemma 3.4 declares that $n^{7/10} D'(h_0) \xrightarrow{\mathcal D} N(0, \sigma_0^2)$, and so $n^{7/10} D'(\hat h_0)$ must have the same weak limit. Since $h^*/h_0 \to 1$ in probability, it is easily shown that $M''(h^*) = c_3 n^{-2/5} + o_p(n^{-2/5})$. Combining the estimates from (2.3) down, we conclude that

$$n^{3/10}(\hat h_0 - h_0) \xrightarrow{\mathcal D} N(0, \sigma_0^2 c_3^{-2}). \tag{2.4}$$

Next we prove a limit theorem for $\hat h_c - \hat h_0$. Notice that

$$0 = CV'(\hat h_c) = M'(\hat h_c) + D'(\hat h_c) + \delta'(\hat h_c) = (\hat h_c - h_0) M''(h^*) + D'(\hat h_c) + \delta'(\hat h_c), \tag{2.5}$$

where on this occasion $h^*$ lies in between $\hat h_c$ and $h_0$.

Using Lemmas 3.2 and 3.3 in the same manner as before, we find that $D'(\hat h_c) + \delta'(\hat h_c) = D'(h_0) + \delta'(h_0) + o_p(n^{-7/10})$. Lemmas 3.4 and 3.5 imply that $D'(h_0) + \delta'(h_0) = O_p(n^{-7/10})$. Since $h^*/h_0 \to 1$ in probability, it is easily shown that $M''(h^*) = c_3 n^{-2/5} + o_p(n^{-2/5})$. Using these results in (2.5), we find that

$$0 = (\hat h_c - h_0) c_3 n^{-2/5}\{1 + o_p(1)\} + O_p(n^{-7/10}),$$

and so $\hat h_c - h_0 = O_p(n^{-3/10})$. This means that

$$(\hat h_c - h_0) M''(h^*) = (\hat h_c - h_0) c_3 n^{-2/5} + o_p(n^{-7/10}),$$

and so we may refine (2.5) as follows:

$$0 = (\hat h_c - h_0) c_3 n^{-2/5} + D'(h_0) + \delta'(h_0) + o_p(n^{-7/10}).$$

We already know from the previous paragraph that

$$0 = (\hat h_0 - h_0) c_3 n^{-2/5} + D'(h_0) + o_p(n^{-7/10}).$$

Subtracting:

$$0 = (\hat h_c - \hat h_0) c_3 n^{-2/5} + \delta'(h_0) + o_p(n^{-7/10}).$$

This result and Lemma 3.5 entail

$$n^{3/10}(\hat h_c - \hat h_0) \xrightarrow{\mathcal D} N(0, \sigma_c^2 c_3^{-2}). \tag{2.6}$$


We pause to combine (2.4) and (2.6) into a theorem.

THEOREM 2.1. Under conditions (2.1) and (2.2),

$$n^{3/10}(\hat h_0 - h_0) \xrightarrow{\mathcal D} N(0, \sigma_0^2 c_3^{-2}) \quad\text{and}\quad n^{3/10}(\hat h_c - \hat h_0) \xrightarrow{\mathcal D} N(0, \sigma_c^2 c_3^{-2}).$$

Having derived these formulae, it is only a short step to describe the amount by which $h_0$ and $\hat h_c$ fail to minimize integrated square error. For that purpose we impose an additional condition on $K$:

(2.7) $K$ has a second derivative on $\mathbb{R}$, and $K''$ is Hölder continuous.

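Let $\hat h$ denote either $h_0$ or $\hat h_c$, and notice that

$$A(\hat h) - A(\hat h_0) = \tfrac12 (\hat h - \hat h_0)^2 A''(h^*),$$

where $h^*$ lies in between $\hat h$ and $\hat h_0$. (The first-order term vanishes because $A'(\hat h_0) = 0$.) In view of Lemma 3.6 and the fact that $h^*/h_0 \to 1$ in probability, $A''(h^*) = M''(h^*) + o_p(n^{-2/5})$. But $M''(h^*) = c_3 n^{-2/5} + o_p(n^{-2/5})$, and so, since $\hat h - \hat h_0 = O_p(n^{-3/10})$,

$$A(\hat h) - A(\hat h_0) = \tfrac12 (\hat h - \hat h_0)^2 c_3 n^{-2/5} + o_p(n^{-1}).$$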


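Our next result is now immediate from Theorem 2.1.

THEOREM 2.2. Under conditions (2.1), (2.2) and (2.7),

$$n\{A(h_0) - A(\hat h_0)\} \xrightarrow{\mathcal D} \tfrac12 \sigma_0^2 c_3^{-1} \chi_1^2 \quad\text{and}\quad n\{A(\hat h_c) - A(\hat h_0)\} \xrightarrow{\mathcal D} \tfrac12 \sigma_c^2 c_3^{-1} \chi_1^2 .$$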
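The step from the expansion of $A(\hat h) - A(\hat h_0)$ to a chi-squared limit is short bookkeeping. The sketch below writes $Z$ for a standard normal variable (our notation). It uses only Theorem 2.1, the expansion just derived, and the continuous mapping theorem. For $\hat h = h_0$ the sign of $n^{3/10}(h_0 - \hat h_0)$ is reversed relative to Theorem 2.1, which does not affect the square.

```latex
\begin{aligned}
n\{A(\hat h) - A(\hat h_0)\}
  &= \tfrac12 c_3 \{n^{3/10}(\hat h - \hat h_0)\}^2 + o_p(1),
  && \text{since } n \cdot n^{-2/5} = (n^{3/10})^2,\\
n^{3/10}(\hat h - \hat h_0) &\xrightarrow{\mathcal D} \sigma c_3^{-1} Z,
  && \sigma = \sigma_0 \text{ if } \hat h = h_0,\ \ \sigma = \sigma_c \text{ if } \hat h = \hat h_c,\\
n\{A(\hat h) - A(\hat h_0)\} &\xrightarrow{\mathcal D} \tfrac12 c_3 \sigma^2 c_3^{-2} Z^2
  = \tfrac12 \sigma^2 c_3^{-1} \chi^2_1 .
\end{aligned}
```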


REMARKS.

2.1. Higher-dimensional data.
In the case of $p$ dimensions we should define $L$ by

$$L(z) = -\sum_{i=1}^{p} z_i K_i(z),$$

where $z = (z_1, \ldots, z_p)$ and $K_i(z) = (\partial/\partial z_i) K(z)$. We assume $p$-dimensional versions of (2.1), (2.2) and (2.7), and define

$$k = \tfrac12 \int z_i^2 K(z)\,dz \ \ (\text{not depending on } i), \qquad c_1 = \int K^2, \qquad c_2 = k^2 \int (\nabla^2 f)^2,$$
$$c_0 = (p c_1/4c_2)^{1/(p+4)}, \qquad c_3 = p(p+1) c_1 c_0^{-(p+2)} + 12 c_2 c_0^2,$$
$$\sigma_0^2 = 8p^2 c_0^{-(p+2)} \Big(\int f^2\Big) \int \Big[\int K(y+z)\{K(z) - L(z)\}\,dz\Big]^2 dy + (4kc_0)^2 \Big\{\int f (\nabla^2 f)^2 - \Big(\int f \nabla^2 f\Big)^2\Big\},$$
$$\sigma_c^2 = 8p^2 c_0^{-(p+2)} \Big(\int f^2\Big)\Big(\int L^2\Big) + (4kc_0)^2 \Big\{\int f (\nabla^2 f)^2 - \Big(\int f \nabla^2 f\Big)^2\Big\}.$$

Theorem 2.2 holds as before, and the only change to Theorem 2.1 is that the factor $n^{3/10}$ should be replaced by $n^{(p+2)/\{2(p+4)\}}$.
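The new exponent can be checked heuristically. Take the $p$-dimensional leading-term form of $M(h)$, which we assume to be $c_1(nh^p)^{-1} + c_2 h^4$; this is consistent with the definitions of $c_0$ and $c_3$ above. Also take as given, as Theorem 2.2 asserts, that $A(\hat h) - A(\hat h_0)$ is of exact order $n^{-1}$.

```latex
\begin{aligned}
M'(h) &= -p c_1 n^{-1} h^{-(p+1)} + 4 c_2 h^3 = 0
  \;\Rightarrow\; h_0 = (p c_1/4 c_2)^{1/(p+4)}\, n^{-1/(p+4)} = c_0 n^{-1/(p+4)},\\
M''(h_0) &= p(p+1) c_1 n^{-1} h_0^{-(p+2)} + 12 c_2 h_0^2 = c_3\, n^{-2/(p+4)},\\
A(\hat h) - A(\hat h_0) &\asymp (\hat h - \hat h_0)^2\, n^{-2/(p+4)} \asymp n^{-1}
  \;\Rightarrow\; \hat h - \hat h_0 \asymp n^{-(p+2)/\{2(p+4)\}} .
\end{aligned}
```

For $p = 1$ the exponent is $3/10$, recovering Theorem 2.1.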


2.2. Higher-order kernels. The forms of Theorems 2.1 and 2.2 remain unchanged if we admit more general kernels. To illustrate this, we shall confine attention to the case $p = 1$. Higher dimensions may be treated similarly. If $K$ is chosen so that $\int K = 1$ and, for some integer $t \ge 2$,

$$\int z^j K(z)\,dz = 0 \ \text{ for } 1 \le j \le t-1, \qquad \int z^t K(z)\,dz \ne 0,$$

then the kernel $L$ also enjoys these properties. A version of Theorem 2.1 holds in which $n^{3/10}$ is replaced by $n^{3/\{2(2t+1)\}}$, and Theorem 2.2 holds as before.
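The claim that $L$ inherits the moment conditions follows from integration by parts. When $K$ vanishes at the ends of its support, $\int z^j L(z)\,dz = -\int z^{j+1} K'(z)\,dz = (j+1)\int z^j K(z)\,dz$. The exact check below uses rational arithmetic and an illustrative fourth-order kernel, $K(z) = \tfrac{105}{64}(1-z^2)^2(1-3z^2)$ on $[-1,1]$. That kernel is our choice: it has $t = 4$ and a continuous derivative.

```python
from fractions import Fraction

# Assumed example (not from the paper): K(z) = (105/64)(1 - 5z^2 + 7z^4 - 3z^6) on [-1, 1].
# Polynomials are stored as {power: coefficient} with exact rational coefficients.
C = Fraction(105, 64)
K = {0: C, 2: -5 * C, 4: 7 * C, 6: -3 * C}

def derivative(p):
    return {e - 1: c * e for e, c in p.items() if e > 0}

def times_minus_z(p):
    return {e + 1: -c for e, c in p.items()}

def moment(p, j):
    """Exact value of the integral over [-1, 1] of z^j p(z)."""
    total = Fraction(0)
    for e, c in p.items():
        if (e + j) % 2 == 0:                    # odd powers integrate to zero
            total += c * Fraction(2, e + j + 1)
    return total

L = times_minus_z(derivative(K))                # L(z) = -z K'(z)

for j in range(5):
    print(j, moment(K, j), moment(L, j))
```

Both kernels integrate to one and have vanishing moments of orders 1 to 3. Their fourth moments are $-1/33$ and $-5/33$, so $L$ is again of order $t = 4$.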


2.3. For the sake of simplicity we shall confine attention to the case of positive kernels and one-dimensional data. We shall adhere to our convention, discussed in Section 1, that "better" windows $\hat h$ are those which give smaller integrated square error.


Estimates of this type give:

If we write $S_1 = \sum_i \sum_{j<i} A(X_i, X_j)$ and $S_2 = \sum_i a(X_i)$, then $E\{A(X_i, X_j) \mid X_j\} = 0$ almost surely for each $j < i$, and so for any real $c$ and $d$, the variables

$$Y_i \equiv c \sum_{j=1}^{i-1} A(X_i, X_j) + d\, a(X_i), \qquad 1 \le i \le n,$$

are zero-mean martingale differences with respect to the $\sigma$-fields $\mathcal F_i = \mathcal F\{X_1, \ldots, X_i\}$. In this sense, $cS_1 + dS_2 = \sum_{i=1}^{n} Y_i$ is a martingale.

The argument leading to Hall's [12] Theorem 1 shows that $cS_1 + dS_2$ is asymptotically normally distributed with variance $c^2 \operatorname{var}(S_1) + d^2 \operatorname{var}(S_2)$. This property, together with the Cramér-Wold device, permits us to complete the proof of (3.19) by showing that


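$$n^{9/5} \operatorname{var}(S_1) \to 2c_0^{-1} \Big(\int f^2\Big) \int \Big[\int K(y+z)\{K(z) - L(z)\}\,dz\Big]^2 dy, \tag{3.20}$$

$$n^{9/5} \operatorname{var}(S_2) \to 4k^2 c_0^4 \Big\{\int (f'')^2 f - \Big(\int f'' f\Big)^2\Big\}. \tag{3.21}$$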

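$$\gamma_1(x, y) = E\Big[\Big\{K\Big(\frac{x-X}{h_0}\Big) - EK\Big(\frac{x-X}{h_0}\Big)\Big\}\Big\{K\Big(\frac{y-X}{h_0}\Big) - EK\Big(\frac{y-X}{h_0}\Big)\Big\}\Big],$$

$$\gamma_2(x, y) = E\Big[\Big\{L\Big(\frac{x-X}{h_0}\Big) - EL\Big(\frac{x-X}{h_0}\Big)\Big\}\Big\{L\Big(\frac{y-X}{h_0}\Big) - EL\Big(\frac{y-X}{h_0}\Big)\Big\}\Big],$$

$$\gamma_3(x, y) = E\Big[\Big\{K\Big(\frac{x-X}{h_0}\Big) - EK\Big(\frac{x-X}{h_0}\Big)\Big\}\Big\{L\Big(\frac{y-X}{h_0}\Big) - EL\Big(\frac{y-X}{h_0}\Big)\Big\}\Big],$$

and $\gamma_4(x, y) = \gamma_3(y, x)$. Then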


varfS^  =  (nhQ)  n(n-l)  //  (2-y  “  -  2y  ]  ,  -  -  2',^  +  %  r:  +  ■■y4) 

The functions $\gamma_i$ are covariances, and each may be expressed in the form $E(UV) - E(U)E(V)$ for variables $U$ and $V$. A little algebra shows that the term $E(U)E(V)$ makes a negligible contribution, and in fact
Also by Lemma 3.2, $A'(h_0) = D'(h_0) = O_p(n^{-3/5-\epsilon})$, and so

$$O_p(n^{-3/5-\epsilon}) = M'(\hat h_0) - M'(h_0) = (\hat h_0 - h_0) M''(h^*), \tag{3.18}$$

where $h^*$ lies in between $\hat h_0$ and $h_0$. As in Section 2, $M''(h^*) = c_3 n^{-2/5} + o_p(n^{-2/5})$. Using this estimate in (3.18) we conclude that $\hat h_0 - h_0 = O_p(n^{-1/5-\epsilon})$, as required.

To treat $|\hat h_c - h_0|$, notice that $\hat h_c/h_0 \to 1$ in probability. Therefore

$$CV'(h_0) = CV'(h_0) - CV'(\hat h_c) = A'(h_0) - A'(\hat h_c) + \delta'(h_0) - \delta'(\hat h_c) = M'(h_0) - M'(\hat h_c) + O_p(n^{-3/5-\epsilon}),$$

again using Lemma 3.2. But $CV'(h_0) = M'(h_0) + O_p(n^{-3/5-\epsilon})$, and so as before it follows that $\hat h_c - h_0 = O_p(n^{-1/5-\epsilon})$.

LEMMA 3.4. $n^{7/10} D'(h_0) \xrightarrow{\mathcal D} N(0, \sigma_0^2)$.


PROOF. We shall start from decomposition (3.5), and prove that $n^{9/10} D_1(h_0) \xrightarrow{\mathcal D} N(0, c_0^2 \sigma_0^2/4)$. (Since $D_1 = -(h/2)D'$ and $n^{1/5} h_0 \to c_0$, this is equivalent to the lemma.) Now, the argument leading to (3.9) gives $E\{S_3^2(h_0)\} = O(n^{-13/5})$, and so $S_3(h_0) = o_p(n^{-9/10})$. Therefore by (3.5), it suffices to show that

$$(n^{9/10} S_1,\ n^{9/10} S_2) \xrightarrow{\mathcal D} (Z_1, Z_2), \tag{3.19}$$

where $S_i = S_i(h_0)$, and $Z_1$ and $Z_2$ are independent normal variables with zero means and variances adding up to $c_0^2 \sigma_0^2/4$.

Our route to (3.19) uses the argument of Hall [12], and so we omit many details. The variables $S_1$ and $S_2$ are uncorrelated.
For $a < \lim n^{1/5} h_1 < b$, suppose

$$n^{1/5} h_1 - n^{-\epsilon_2} = t_0 < t_1 < \cdots < t_{m-1} < n^{1/5} h_1 + n^{-\epsilon_2} \le t_m,$$

where $t_i - t_{i-1} = n^{1/5-\alpha}$ for each $i$. In view of (3.16), to finish the proof of (3.15) it suffices to check that

$$\sup_{(t_i, t_j) \in T} n^{7/10} |D'(n^{-1/5} t_i) - D'(n^{-1/5} t_j)| \xrightarrow{p} 0,$$

where $T$ is the set of all pairs $(t_i, t_j)$ with $0 \le i < j \le m$.

For any $\eta > 0$,

$$P\Big\{\sup_{(t_i, t_j) \in T} n^{7/10} |D'(n^{-1/5} t_i) - D'(n^{-1/5} t_j)| > \eta\Big\} \le \sum_{(t_i, t_j) \in T} \eta^{-2\ell} E\{n^{7/10} |D'(n^{-1/5} t_i) - D'(n^{-1/5} t_j)|\}^{2\ell} \le C \eta^{-2\ell} n^{2(\alpha - \epsilon_2 - 1/5)} (n^{-\epsilon_2})^{\epsilon \ell}, \tag{3.17}$$

using (3.3) and the fact that the number of elements in $T$ is of order $n^{2(\alpha - \epsilon_2 - 1/5)}$. By choosing $\ell$ sufficiently large we may ensure that the term in (3.17) converges to zero as $n \to \infty$. This proves (3.15). A similar partitioning argument may be used to prove (3.14).


LEMMA 3.3. For some $\epsilon > 0$,

$$|\hat h_0 - h_0| + |\hat h_c - h_0| = O_p(n^{-1/5-\epsilon}).$$

PROOF. First we treat $|\hat h_0 - h_0|$. It is not difficult to prove, using techniques of Hall [11] (p. 1160), that $\hat h_0/h_0 \to 1$ in probability. Therefore by Lemma 3.1,

$$A'(h_0) = A'(h_0) - A'(\hat h_0) = M'(h_0) - M'(\hat h_0) + O_p(n^{-3/5-\epsilon}).$$
and where

$$|W_s(i) - W_t(i)| \le C n^{1/5} |s - t|^{\epsilon}.$$

Hence, for $m = 2, \ldots, 2\ell$,

$$|\operatorname{cum}_m(n^{13/10}\{S_{31}(n^{-1/5}s) - S_{31}(n^{-1/5}t)\})| \le C n^{1 - m/2} |s - t|^{\epsilon m}.$$

This completes the proof of (3.9) and hence that of (3.3). The same type of argument may be used to prove (3.1), (3.2) and (3.4).

LEMMA 3.2. For some $\epsilon > 0$ and any $0 < a < b < \infty$,

$$\sup_{a \le t \le b}\{|D'(n^{-1/5}t)| + |\delta'(n^{-1/5}t)|\} = O_p(n^{-3/5-\epsilon}). \tag{3.14}$$

Furthermore, for any $\epsilon_2 > 0$ and any non-random $h_1$ asymptotic to a constant multiple of $n^{-1/5}$,

$$\sup_{|t - n^{1/5} h_1| \le n^{-\epsilon_2}} n^{7/10}\{|D'(n^{-1/5}t) - D'(h_1)| + |\delta'(n^{-1/5}t) - \delta'(h_1)|\} \xrightarrow{p} 0. \tag{3.15}$$


PROOF. We give a proof only for $D'$. The proof for $\delta'$ is similar. To check (3.15), note that using the decomposition (3.5) of $D'$, the Hölder continuity of $K$ and $L$, and the fact that both of these functions have compact support, there is an $\alpha > 0$ sufficiently large that

$$\sup_{a < s < t < 2b,\ |s - t| < n^{1/5-\alpha}} |D'(n^{-1/5}s) - D'(n^{-1/5}t)| = O(n^{-1}). \tag{3.16}$$


and where

$$E[V_t(i)] = 0,$$

$$|V_s(i) - V_t(i)| \le C n^{-2/5} |s - t|^{\epsilon}. \tag{3.13}$$


By a cumulant expansion of the $2\ell$-th moment, to show (3.8) it is enough to check that for $m = 2, \ldots, 2\ell$,

$$|\operatorname{cum}_m(n^{9/10}\{S_{21}(n^{-1/5}s) - S_{21}(n^{-1/5}t)\})| \le C |s - t|^{\epsilon m},$$

where $\operatorname{cum}_m(\cdot)$ denotes the $m$-th order cumulant. But, by the independence property of cumulants,

$$|\operatorname{cum}_m(n^{9/10}\{S_{21}(n^{-1/5}s) - S_{21}(n^{-1/5}t)\})| = n^{-m/10} \Big|\sum_{i=1}^{n} \operatorname{cum}_m(V_s(i) - V_t(i))\Big| \le C n^{1-m/10} [n^{-2/5} |s - t|^{\epsilon}]^m = C n^{1-m/2} |s - t|^{\epsilon m},$$

where the inequality follows from (3.13). This completes the proof of (3.8).
The verification of (3.9) is quite similar to that of (3.8), so only differences will be noted. Write

$$S_{31}(n^{-1/5}t) = n^{-2} \sum_{i=1}^{n} W_t(i),$$

where

$$E[W_t(i)] = 0,$$


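Hence,

$$|U_s(i,j) - U_t(i,j)| \le C |s - t|^{\epsilon} \Big\{(n^{-1/5} b)^{-1} I_{[-2,2]}\Big(\frac{X_i - X_j}{n^{-1/5} b}\Big) + 1\Big\}. \tag{3.11}$$

$$E[n^{9/10}\{S_{11}(n^{-1/5}s) - S_{11}(n^{-1/5}t)\}]^{2\ell} = n^{-11\ell/5} \sum_{i_1 < j_1} \cdots \sum_{i_{2\ell} < j_{2\ell}} E[\{U_s(i_1, j_1) - U_t(i_1, j_1)\} \cdots \{U_s(i_{2\ell}, j_{2\ell}) - U_t(i_{2\ell}, j_{2\ell})\}]. \tag{3.12}$$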


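Rearrange the terms on the right side of (3.12) into $4\ell$ groups, where the term indexed by $i_1, j_1, \ldots, i_{2\ell}, j_{2\ell}$ is put in the $m$-th group when there are exactly $m$ distinct integers in the list $i_1, j_1, \ldots, i_{2\ell}, j_{2\ell}$. Note that the cardinality of the $m$-th group is bounded by $Cn^m$, and by (3.10), each term is 0 in the groups $2\ell+1, \ldots, 4\ell$. (If $m > 2\ell$, some index occurs exactly once in the list, and conditioning on the remaining variables shows via (3.10) that the expectation vanishes.) Hence, by (3.11) and integration by substitution,

$$E[n^{9/10}\{S_{11}(n^{-1/5}s) - S_{11}(n^{-1/5}t)\}]^{2\ell} \le \sum_{m=2}^{2\ell} C_1 n^{-11\ell/5} n^m |s - t|^{2\ell\epsilon} n^{2\ell/5 - m/10} \le C_2 |s - t|^{2\ell\epsilon},$$

and the proof of (3.7) is complete.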
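The vanishing of groups $2\ell+1, \ldots, 4\ell$ is a pigeonhole fact. The $2\ell$ pairs supply $4\ell$ slots, so if more than $2\ell$ distinct integers appear, at least one of them appears only once. The brute-force confirmation below covers small $n$ and $\ell$ and is purely illustrative; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations, product

def check_vanishing_groups(n, ell):
    """Return True if every list of 2*ell pairs (i < j) drawn from range(n) that has
    more than 2*ell distinct integers contains an integer occurring exactly once."""
    pairs = list(combinations(range(n), 2))      # all (i, j) with i < j
    for choice in product(pairs, repeat=2 * ell):
        counts = Counter(idx for pair in choice for idx in pair)
        m = len(counts)                          # number of distinct integers
        if m > 2 * ell and min(counts.values()) > 1:
            return False                         # would contradict the pigeonhole argument
    return True

print(check_vanishing_groups(6, 1), check_vanishing_groups(5, 2))
```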

To verify (3.8), note that by Taylor's theorem, (2.1), (2.2) and the fact that $L$ is also symmetric and integrates to 1, for $t \in (a, b)$,

$$|2Ef_n(x \mid n^{-1/5}t) - Eg_n(x \mid n^{-1/5}t) - f(x)| \le C n^{-2/5}.$$

Hence $S_{21}$ may be written

$$S_{21}(n^{-1/5}t) = n^{-1} \sum_{i=1}^{n} V_t(i),$$


$$T_{1\ell} \equiv 2\{n(n-1)h\}^{-1} \sum_{1 \le i < j \le n} \{B_\ell(X_i, X_j) - b_\ell(X_i) - b_\ell(X_j) + a_\ell\},$$

T2£  =-  (nh)'1  J  {b  (X  )  -  -  f(X.)  +  ff2}, 

i=l 

$B_1(x, y) = K\{(x - y)/h\}$, $B_2(x, y) = L\{(x - y)/h\}$, $b_\ell(x) = E\{B_\ell(x, X)\}$ and $a_\ell = E\{b_\ell(X)\}$.

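To prove (3.3) we shall show that for some $\epsilon > 0$,

$$E|n^{9/10}\{S_{11}(n^{-1/5}s) - S_{11}(n^{-1/5}t)\}|^{2\ell} \le C |s - t|^{\epsilon\ell}, \tag{3.7}$$

$$E|n^{9/10}\{S_{21}(n^{-1/5}s) - S_{21}(n^{-1/5}t)\}|^{2\ell} \le C |s - t|^{\epsilon\ell}, \tag{3.8}$$

$$E|n^{13/10}\{S_{31}(n^{-1/5}s) - S_{31}(n^{-1/5}t)\}|^{2\ell} \le C |s - t|^{\epsilon\ell}. \tag{3.9}$$

Similar inequalities may be established for the functions $S_{12}$, $S_{22}$ and $S_{32}$.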

To verify (3.7), note that $S_{11}$ may be written as

$$S_{11}(n^{-1/5}t) = n^{-2} \sum_{1 \le i < j \le n} U_t(i, j),$$

and $U_t(i, j)$ satisfies

$$E[U_t(i, j) \mid X_i] = E[U_t(i, j) \mid X_j] = 0. \tag{3.10}$$

By the compactness of support (which without loss of generality may be taken to be $[-1, 1]$) and the Hölder continuity of $K$, for $s, t \in (a, b)$,


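$$D_1(h) \equiv -(h/2) D'(h) = S_1(h) + S_2(h) + S_3(h), \tag{3.5}$$

where $S_1 = S_{11} - S_{12}$, $S_2 = S_{21} + S_{22}$, $S_3 = S_{31} - S_{32}$,

$$S_{11} \equiv 2(nh)^{-2} \sum_{1 \le i < j \le n} \int \Big\{K\Big(\frac{x - X_i}{h}\Big) - EK\Big(\frac{x - X}{h}\Big)\Big\}\Big\{K\Big(\frac{x - X_j}{h}\Big) - EK\Big(\frac{x - X}{h}\Big)\Big\}\,dx,$$

$$S_{12} \equiv (nh)^{-2} \sum_{1 \le i < j \le n} \int \Big[\Big\{K\Big(\frac{x - X_i}{h}\Big) - EK\Big(\frac{x - X}{h}\Big)\Big\}\Big\{L\Big(\frac{x - X_j}{h}\Big) - EL\Big(\frac{x - X}{h}\Big)\Big\} + \Big\{L\Big(\frac{x - X_i}{h}\Big) - EL\Big(\frac{x - X}{h}\Big)\Big\}\Big\{K\Big(\frac{x - X_j}{h}\Big) - EK\Big(\frac{x - X}{h}\Big)\Big\}\Big]\,dx,$$

$$S_{21} \equiv (nh)^{-1} \sum_{i=1}^{n} \int \Big\{K\Big(\frac{x - X_i}{h}\Big) - EK\Big(\frac{x - X}{h}\Big)\Big\}\{2Ef_n(x \mid h) - Eg_n(x \mid h) - f(x)\}\,dx,$$

$$S_{22} \equiv (nh)^{-1} \sum_{i=1}^{n} \int \Big\{L\Big(\frac{x - X_i}{h}\Big) - EL\Big(\frac{x - X}{h}\Big)\Big\}\{f(x) - Ef_n(x \mid h)\}\,dx,$$

$$S_{31} \equiv (nh)^{-2} \sum_{i=1}^{n} \int \Big[\Big\{K\Big(\frac{x - X_i}{h}\Big) - EK\Big(\frac{x - X}{h}\Big)\Big\}^2 - E\Big\{K\Big(\frac{x - X}{h}\Big) - EK\Big(\frac{x - X}{h}\Big)\Big\}^2\Big]\,dx,$$

$$S_{32} \equiv (nh)^{-2} \sum_{i=1}^{n} \int \Big[\Big\{K\Big(\frac{x - X_i}{h}\Big) - EK\Big(\frac{x - X}{h}\Big)\Big\}\Big\{L\Big(\frac{x - X_i}{h}\Big) - EL\Big(\frac{x - X}{h}\Big)\Big\} - E\Big\{K\Big(\frac{x - X}{h}\Big) - EK\Big(\frac{x - X}{h}\Big)\Big\}\Big\{L\Big(\frac{x - X}{h}\Big) - EL\Big(\frac{x - X}{h}\Big)\Big\}\Big]\,dx.$$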


A similar argument produces the decomposition

$$\delta_1(h) \equiv -(h/2)\delta'(h) = T_1(h) + T_2(h), \tag{3.6}$$

where $T_1 = T_{11} - T_{12}$, $T_2 = T_{21} - T_{22}$,


3. Lemmas. The lemmas below were required for the proofs of Theorems 2.1 and 2.2. In Lemmas 3.1-3.5, we assume conditions (2.1) and (2.2).

The symbols $C$, $C_1$ and $C_2$ denote generic positive constants.
LEMMA 3.1. For each $0 < a < b < \infty$ and all positive integers $\ell$,

$$\sup_{n;\ a \le t \le b} E|n^{7/10} D'(n^{-1/5}t)|^{2\ell} \le C_1(a, b, \ell), \tag{3.1}$$

$$\sup_{n;\ a \le t \le b} E|n^{7/10} \delta'(n^{-1/5}t)|^{2\ell} \le C_1(a, b, \ell). \tag{3.2}$$

Furthermore, there exists $\epsilon > 0$ such that

$$E|n^{7/10}\{D'(n^{-1/5}s) - D'(n^{-1/5}t)\}|^{2\ell} \le C_2(a, b, \ell) |s - t|^{\epsilon\ell}, \tag{3.3}$$

$$E|n^{7/10}\{\delta'(n^{-1/5}s) - \delta'(n^{-1/5}t)\}|^{2\ell} \le C_2(a, b, \ell) |s - t|^{\epsilon\ell} \tag{3.4}$$

whenever $a \le s \le t \le b$.

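PROOF. We begin by decomposing $D'$ and $\delta'$. Let $g_n(x \mid h) = (nh)^{-1} \sum_{i=1}^{n} L\{(x - X_i)/h\}$, and observe that

$$-(h/2)A'(h) = \int (f_n - f)(f_n - g_n)$$
$$= \int (f_n - Ef_n)^2 - \int (f_n - Ef_n)(g_n - Eg_n) + \int (f_n - Ef_n)(2Ef_n - Eg_n - f) + \int (g_n - Eg_n)(f - Ef_n) + \int (Ef_n - f)(Ef_n - Eg_n).$$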
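The identity $-(h/2)A'(h) = \int (f_n - f)(f_n - g_n)$ rests on $\partial f_n(x \mid h)/\partial h = h^{-1}\{g_n(x \mid h) - f_n(x \mid h)\}$, which follows from $L(z) = -zK'(z)$. The numerical sanity check below makes these assumptions:

- the biweight kernel, for which $L(z) = \tfrac{15}{4} z^2 (1 - z^2)$ on $[-1, 1]$;
- a standard normal $f$;
- a central finite difference for $A'(h)$;
- one trapezoidal grid for all integrals.

```python
import math
import random

def K(z):
    """Biweight kernel (assumed example)."""
    return 15.0 / 16.0 * (1.0 - z * z) ** 2 if abs(z) <= 1.0 else 0.0

def L(z):
    """L(z) = -z K'(z) for the biweight: (15/4) z^2 (1 - z^2) on [-1, 1]."""
    return 15.0 / 4.0 * z * z * (1.0 - z * z) if abs(z) <= 1.0 else 0.0

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

random.seed(3)
X = [random.gauss(0.0, 1.0) for _ in range(40)]
n = len(X)
lo, hi, m = -8.0, 8.0, 4000
step = (hi - lo) / m
xs = [lo + j * step for j in range(m + 1)]

def integral(vals):
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def est(kernel, h):
    """(n h)^{-1} sum_i kernel((x - X_i) / h) on the grid: f_n for K, g_n for L."""
    return [sum(kernel((x - xi) / h) for xi in X) / (n * h) for x in xs]

def ise(h):
    return integral([(a - phi(x)) ** 2 for a, x in zip(est(K, h), xs)])

h, eps = 0.5, 1e-5
lhs = -(h / 2.0) * (ise(h + eps) - ise(h - eps)) / (2.0 * eps)
fn, gn = est(K, h), est(L, h)
rhs = integral([(a - phi(x)) * (a - b) for a, b, x in zip(fn, gn, xs)])
print(lhs, rhs)
```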

By expanding $\int (f_n - Ef_n)^2$ as a sum of integrals of squares plus a sum of integrals of products, and expanding $\int (f_n - Ef_n)(g_n - Eg_n)$ in a similar way, we conclude that


It is easily shown that

$$\big(n^{3/10}(\hat h_0 - h_0),\ n^{3/10}(\hat h_c - \hat h_0)\big) \xrightarrow{\mathcal D} (Z_1, Z_2),$$

say, where $(Z_1, Z_2)$ has a joint normal distribution. Consequently, the limit

$$\lim_{n \to \infty} P\{A(h_0) > A(\hat h_c)\}$$

exists, and is strictly positive.
Let $\hat h$ be a window satisfying $\hat h/h_0 \to 1$ in probability. Assume conditions (2.1), (2.2) and (2.7). Using the argument leading to Theorem 2.2, we obtain:

$$0 \le A(\hat h) - A(\hat h_0) = \tfrac12 (\hat h - \hat h_0)^2 c_3 n^{-2/5}\{1 + o_p(1)\}. \tag{2.8}$$

We shall consider various possibilities for $\hat h$.

(i) We might explicitly estimate the constant $c_0$ in the asymptotic formula $h_0 \sim c_0 n^{-1/5}$, and take $\hat h$ to be the resulting window. This requires estimation of $\int (f'')^2$, perhaps by integrating the square of a kernel estimate of $f''$. Such an approach is really a global version of Woodroofe's [23] two-stage procedure. Under the smoothness assumption (2.2), the rate of convergence of such an estimator can be slower than $n^{-\epsilon}$ for any given $\epsilon > 0$. In consequence, the error $\hat h - h_0$ may converge to zero in probability at a rate slower than $n^{-1/5-\epsilon}$, and by (2.8), $A(\hat h) - A(\hat h_0)$ may be no smaller than order $n^{-4/5-2\epsilon}$. On the other hand, if $\hat h$ is the cross-validatory window $\hat h_c$, then $A(\hat h) - A(\hat h_0)$ is as small as $n^{-1}$ under the minimal condition (2.2).

(ii) The procedure outlined in (i) is motivated by a desire to estimate $h_0$. Following that philosophy, we would be doing extremely well if we actually knew the value of $h_0$. But according to Theorem 2.2, even if we took $\hat h = h_0$ we would hardly do any better than using the cross-validatory window $\hat h_c$, since in both cases the distance of integrated square error from the minimum would be of order $n^{-1}$.

(iii) If $K$ is a positive kernel then by the Cauchy-Schwarz inequality, $\int [\int K(y+z)\{K(z) - L(z)\}\,dz]^2 dy \le \int (K - L)^2$, and so

$$\sigma_0^2 \le \sigma_c^2 .$$

(This is true in any dimension. Notice that $\int L^2 = \int (K - L)^2$.) In this sense, taking $\hat h = h_0$ does result in a marginal improvement over cross-validation. However, the improvement is not available with


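$$v_1 = -kh^2 \iint \Big\{K\Big(\frac{x - y}{h}\Big) + L\Big(\frac{x - y}{h}\Big)\Big\} f''(x) f(y)\,dy\,dx + o(h^3) = -2kh^3 \int f'' f + o(h^3),$$

$$v_2 = (kh^2)^2 \int \Big[\int \Big\{K\Big(\frac{x - y}{h}\Big) + L\Big(\frac{x - y}{h}\Big)\Big\} f''(x)\,dx\Big]^2 f(y)\,dy + o(h^6) = 4k^2 h^6 \int (f'')^2 f + o(h^6).$$

Result (3.21) now follows from (3.23).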

LEMMA 3.5. $n^{7/10} \delta'(h_0) \xrightarrow{\mathcal D} N(0, \sigma_c^2)$.

PROOF. The martingale methods and Cramér-Wold device used to prove Lemma 3.4 are also applicable here. The argument is based on (3.6) instead of (3.5). We shall prove only the analogue of (3.20):

$$n^{9/5} \operatorname{var}(T_1) \to 2c_0^{-1} \Big(\int f^2\Big) \int (K - L)^2. \tag{3.24}$$

The analogue of (3.21), which declares that $n^{9/5} \operatorname{var}(T_2)$ converges to the same limit as in (3.21), follows as before.

To prove (3.24), notice that with $B = B_1 - B_2$, $b = b_1 - b_2$ and $a = a_1 - a_2$,

$$\operatorname{var}(T_1) = 2\{n(n-1)h_0\}^{-2} n(n-1) E\{B(X_1, X_2) - b(X_1) - b(X_2) + a\}^2$$
$$= 2\{n(n-1)h_0^2\}^{-1} E\{B^2(X_1, X_2) - 2b^2(X_1) + a^2\}$$
$$\sim 2n^{-2} h_0^{-2} E\{B^2(X_1, X_2)\}$$
$$= 2n^{-2} h_0^{-2} \iint \Big\{K\Big(\frac{x - y}{h_0}\Big) - L\Big(\frac{x - y}{h_0}\Big)\Big\}^2 f(x) f(y)\,dx\,dy$$
$$\sim 2n^{-9/5} c_0^{-1} \Big(\int f^2\Big) \int (K - L)^2 .$$
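Two kernel facts used in (3.24) and in item (iii) above are easy to check numerically. The example kernel is the biweight, an assumed choice. The first fact is $\int (K - L)^2 = \int L^2$; both equal $5/7$ here, because $\int KL = \tfrac12 \int K^2$. The second, which needs $K \ge 0$, is the Cauchy-Schwarz bound $\int [\int K(y+z)\{K(z) - L(z)\}\,dz]^2 dy \le \int (K - L)^2$.

```python
def K(z):
    """Biweight kernel (assumed example), nonnegative on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - z * z) ** 2 if abs(z) <= 1.0 else 0.0

def L(z):
    """L(z) = -z K'(z) for the biweight."""
    return 15.0 / 4.0 * z * z * (1.0 - z * z) if abs(z) <= 1.0 else 0.0

def trap(g, lo, hi, m):
    """Composite trapezoidal rule with m subintervals."""
    step = (hi - lo) / m
    return step * (0.5 * (g(lo) + g(hi)) + sum(g(lo + j * step) for j in range(1, m)))

def inner(y):
    """Integral over z of K(y + z) {K(z) - L(z)}."""
    return trap(lambda z: K(y + z) * (K(z) - L(z)), -1.0, 1.0, 400)

int_L2 = trap(lambda z: L(z) ** 2, -1.0, 1.0, 2000)
int_KmL2 = trap(lambda z: (K(z) - L(z)) ** 2, -1.0, 1.0, 2000)
lhs_cs = trap(lambda y: inner(y) ** 2, -2.0, 2.0, 400)   # inner(y) = 0 for |y| > 2
print(int_L2, int_KmL2, lhs_cs)
```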


LEMMA 3.6. Under conditions (2.1), (2.2) and (2.7), and for any $0 < a < b < \infty$,

$$\sup_{a \le t \le b} |D''(n^{-1/5}t)| = o_p(n^{-2/5}). \tag{3.25}$$

PROOF. First derive an analogue of (3.1), using an almost identical argument:

$$\sup_{n;\ a \le t \le b} E|n^{1/2} D''(n^{-1/5}t)|^{2\ell} \le C(a, b, \ell).$$

Then follow the proof of Lemma 3.2, to conclude that (3.25) holds, and in fact the right-hand side equals $O_p(n^{-2/5-\epsilon})$ for some $\epsilon > 0$.